Instrumental Variable

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

Instrumental variable methods allow for consistent estimation when the explanatory variables (covariates) are correlated with the error terms in a regression model. Such correlation may occur when:
# changes in the dependent variable change the value of at least one of the covariates ("reverse" causation),
# there are omitted variables that affect both the dependent and independent variables, or
# the covariates are subject to non-random measurement error.

Explanatory variables that suffer from one or more of these issues in the context of a regression are sometimes referred to as endogenous. In this situation, ordinary least squares produces biased and inconsistent estimates. However, if an ''instrument'' is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation but is correlated with the endogenous explanatory variables, conditionally on the value of other covariates.

In linear models, there are two main requirements for using IVs:
* The instrument must be correlated with the endogenous explanatory variables, conditionally on the other covariates. If this correlation is strong, then the instrument is said to have a strong first stage. A weak correlation may provide misleading inferences about parameter estimates and standard errors.
* The instrument cannot be correlated with the error term in the explanatory equation, conditionally on the other covariates. In other words, the instrument cannot suffer from the same problem as the original predicting variable. If this condition is met, then the instrument is said to satisfy the exclusion restriction.


History

The first use of an instrumental variable occurred in a 1928 book by Philip G. Wright, best known for his excellent description of the production, transport and sale of vegetable and animal oils in the early 1900s in the United States, while in 1945, Olav Reiersøl applied the same approach in the context of errors-in-variables models in his dissertation, giving the method its name.

Wright attempted to determine the supply and demand for butter using panel data on prices and quantities sold in the United States. The idea was that a regression analysis could produce a demand or supply curve because they are formed by the path between prices and quantities demanded or supplied. The problem was that the observational data did not form a demand or supply curve as such, but rather a cloud of point observations that took different shapes under varying market conditions. It seemed that making deductions from the data remained elusive.

The problem was that price affected both supply and demand, so that a function describing only one of the two could not be constructed directly from the observational data. Wright correctly concluded that he needed a variable that correlated with either demand or supply but not both – that is, an instrumental variable. After much deliberation, Wright decided to use regional rainfall as his instrumental variable: he concluded that rainfall affected grass production and hence milk production and ultimately butter supply, but not butter demand. In this way he was able to construct a regression equation with only the instrumental variable of price and supply. (Wooldridge, J.: ''Introductory Econometrics''. South-Western, Scarborough, Canada, 2009.)


Theory

While the ideas behind IV extend to a broad class of models, a very common context for IV is in linear regression. Traditionally, an instrumental variable is defined as a variable ''Z'' that is correlated with the independent variable ''X'' and uncorrelated with the "error term" ''U'' in the linear equation

: Y = X \beta + U

''Y'' is a vector. ''X'' is a matrix, usually with a column of ones and perhaps with additional columns for other covariates. Consider how an instrument allows \beta to be recovered. Recall that OLS solves for \widehat{\beta} such that \operatorname{cov}(X, \widehat{U}) = 0 (when we minimize the sum of squared errors, \min_\beta\, (Y - X\beta)'(Y - X\beta), the first-order condition is exactly X'(Y - X\widehat{\beta}) = X'\widehat{U} = 0). If the true model is believed to have \operatorname{cov}(X, U) \neq 0 due to any of the reasons listed above (for example, if there is an omitted variable which affects both X and Y separately), then this OLS procedure will ''not'' yield the causal impact of X on Y. OLS will simply pick the parameter that makes the resulting errors appear uncorrelated with X.

Consider for simplicity the single-variable case. Suppose we are considering a regression with one variable and a constant (perhaps no other covariates are necessary, or perhaps we have partialed out any other relevant covariates):

: y = \alpha + \beta x + u

In this case, the coefficient on the regressor of interest is given by \widehat{\beta} = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}. Substituting for y gives

: \begin{align} \widehat{\beta} & = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)} = \frac{\operatorname{cov}(x,\, \alpha + \beta x + u)}{\operatorname{var}(x)} \\ & = \frac{\operatorname{cov}(x,\, \alpha + \beta x)}{\operatorname{var}(x)} + \frac{\operatorname{cov}(x,u)}{\operatorname{var}(x)} = \beta^* + \frac{\operatorname{cov}(x,u)}{\operatorname{var}(x)}, \end{align}

where \beta^* is what the estimated coefficient vector would be if ''x'' were not correlated with ''u''. In this case, it can be shown that \beta^* is an unbiased estimator of \beta. If \operatorname{cov}(x,u) \neq 0 in the underlying model that we believe, then OLS gives a coefficient which does ''not'' reflect the underlying causal effect of interest. IV helps to fix this problem by identifying the parameters not based on whether x is uncorrelated with u, but based on whether another variable z is uncorrelated with u. If theory suggests that z is related to x (the first stage) but uncorrelated with u (the exclusion restriction), then IV may identify the causal parameter of interest where OLS fails. Because there are multiple specific ways of using and deriving IV estimators even in just the linear case (IV, 2SLS, GMM), we save further discussion for the Estimation section below.
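To make the bias formula above concrete, the following is a minimal simulation sketch in Python. The data-generating process, coefficients, and variable names are invented purely for illustration: ''x'' is endogenous, OLS is biased by exactly \operatorname{cov}(x,u)/\operatorname{var}(x), and the IV ratio \operatorname{cov}(z,y)/\operatorname{cov}(z,x) recovers \beta.

```python
# A minimal sketch: x is endogenous (correlated with u), so OLS is biased,
# while the IV estimate cov(z, y) / cov(z, x) recovers beta. All numbers
# here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 100_000, 1.0, 2.0

z = rng.normal(size=n)                        # instrument: shifts x, independent of u
u = rng.normal(size=n)                        # structural error
x = 0.5 * z + 0.8 * u + rng.normal(size=n)    # endogenous regressor: cov(x, u) != 0
y = alpha + beta * x + u

beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # beta + cov(x, u)/var(x): biased
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # consistent for beta

print(f"OLS: {beta_ols:.2f}  IV: {beta_iv:.2f}  true beta: {beta}")
# OLS lands near 2.42 (bias 0.8/1.89 above the true 2.0); IV lands near 2.0.
```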


Example

Informally, in attempting to estimate the causal effect of some variable ''X'' on another ''Y'', an instrument is a third variable ''Z'' which affects ''Y'' only through its effect on ''X''. For example, suppose a researcher wishes to estimate the causal effect of smoking on general health. Correlation between health and smoking does not imply that smoking causes poor health because other variables, such as depression, may affect both health and smoking, or because health may affect smoking. It is at best difficult and expensive to conduct controlled experiments on smoking status in the general population. The researcher may attempt to estimate the causal effect of smoking on health from observational data by using the tax rate for tobacco products as an instrument for smoking. The tax rate for tobacco products is a reasonable choice for an instrument because the researcher assumes that it can only be correlated with health through its effect on smoking. If the researcher then finds tobacco taxes and state of health to be correlated, this may be viewed as evidence that smoking causes changes in health. Angrist and Krueger (2001) present a survey of the history and uses of instrumental variable techniques.


Graphical definition

IV techniques have also been developed for a much broader class of non-linear models. General definitions of instrumental variables, using counterfactual and graphical formalism, were given by Pearl (2000; p. 248). The graphical definition requires that ''Z'' satisfy the following conditions:

: (Z \perp\!\!\!\perp Y)_{G_{\overline{X}}} \qquad (Z \not\!\!{\perp\!\!\!\perp} X)_G

where \perp\!\!\!\perp stands for ''d''-separation and G_{\overline{X}} stands for the graph in which all arrows entering ''X'' are cut off. The counterfactual definition requires that ''Z'' satisfies

: (Z \perp\!\!\!\perp Y_x) \qquad (Z \not\!\!{\perp\!\!\!\perp} X)

where ''Y''''x'' stands for the value that ''Y'' would attain had ''X'' been ''x'' and \perp\!\!\!\perp stands for independence. If there are additional covariates ''W'' then the above definitions are modified so that ''Z'' qualifies as an instrument if the given criteria hold conditional on ''W''. The essence of Pearl's definition is:
# The equations of interest are "structural," not "regression".
# The error term ''U'' stands for all exogenous factors that affect ''Y'' when ''X'' is held constant.
# The instrument ''Z'' should be independent of ''U''.
# The instrument ''Z'' should not affect ''Y'' when ''X'' is held constant (exclusion restriction).
# The instrument ''Z'' should not be independent of ''X''.
These conditions do not rely on the specific functional form of the equations and are therefore applicable to nonlinear equations, where ''U'' can be non-additive (see Non-parametric analysis). They are also applicable to a system of multiple equations, in which ''X'' (and other factors) affect ''Y'' through several intermediate variables. An instrumental variable need not be a cause of ''X''; a proxy of such cause may also be used, if it satisfies conditions 1–5. The exclusion restriction (condition 4) is redundant; it follows from conditions 2 and 3.


Selecting suitable instruments

Since ''U'' is unobserved, the requirement that ''Z'' be independent of ''U'' cannot be inferred from data and must instead be determined from the model structure, i.e., the data-generating process. Causal graphs are a representation of this structure, and the graphical definition given above can be used to quickly determine whether a variable ''Z'' qualifies as an instrumental variable given a set of covariates ''W''. To see how, consider the following example.

Figure 1: Proximity qualifies as an instrumental variable given Library Hours.
Figure 2: G_{\overline{X}}, which is used to determine whether Proximity is an instrumental variable.
Figure 3: Proximity does not qualify as an instrumental variable given Library Hours.
Figure 4: Proximity qualifies as an instrumental variable, as long as we do not include Library Hours as a covariate.

Suppose that we wish to estimate the effect of a university tutoring program on grade point average (GPA). The relationship between attending the tutoring program and GPA may be confounded by a number of factors. Students who attend the tutoring program may care more about their grades or may be struggling with their work. This confounding is depicted in Figures 1–3 through the bidirected arc between Tutoring Program and GPA. If students are assigned to dormitories at random, the proximity of the student's dorm to the tutoring program is a natural candidate for being an instrumental variable.

However, what if the tutoring program is located in the college library? In that case, Proximity may also cause students to spend more time at the library, which in turn improves their GPA (see Figure 1). Using the causal graph depicted in Figure 2, we see that Proximity does not qualify as an instrumental variable because it is connected to GPA through the path Proximity \rightarrow Library Hours \rightarrow GPA in G_{\overline{X}}. However, if we control for Library Hours by adding it as a covariate then Proximity becomes an instrumental variable, since Proximity is separated from GPA given Library Hours in G_{\overline{X}} (a sketch of this check appears at the end of this section).

Now, suppose that we notice that a student's "natural ability" affects his or her number of hours in the library as well as his or her GPA, as in Figure 3. Using the causal graph, we see that Library Hours is a collider and conditioning on it opens the path Proximity \rightarrow Library Hours \leftrightarrow GPA. As a result, Proximity cannot be used as an instrumental variable. Finally, suppose that Library Hours does not actually affect GPA because students who do not study in the library simply study elsewhere, as in Figure 4. In this case, controlling for Library Hours still opens a spurious path from Proximity to GPA. However, if we do not control for Library Hours and remove it as a covariate then Proximity can again be used as an instrumental variable.
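As a sketch of how this check can be automated, the snippet below encodes the Figure 1 scenario as a directed graph, with a hypothetical latent node U standing in for the bidirected confounding arc, and tests the graphical criterion with networkx. It assumes networkx ≥ 3.3, where the d-separation test is exposed as is_d_separator (older versions call it d_separated).

```python
# A sketch of the graphical instrument test for Figure 1. "U" is a latent
# node standing in for the bidirected confounding arc between Tutoring
# and GPA. Assumes networkx >= 3.3 (older versions name the test
# nx.d_separated).
import networkx as nx

G = nx.DiGraph([
    ("Proximity", "Tutoring"),        # dorm proximity -> program attendance
    ("Tutoring", "GPA"),              # causal effect of interest
    ("Proximity", "LibraryHours"),    # proximity also raises library time
    ("LibraryHours", "GPA"),
    ("U", "Tutoring"), ("U", "GPA"),  # latent confounding
])

# Build G_Xbar: cut every arrow entering the treatment, Tutoring.
G_bar = G.copy()
G_bar.remove_edges_from([(u, v) for u, v in G.edges if v == "Tutoring"])

# (Z d-separated from Y in G_Xbar) given the covariate Library Hours: True
print(nx.is_d_separator(G_bar, {"Proximity"}, {"GPA"}, {"LibraryHours"}))
# Without conditioning, the path through Library Hours stays open: False
print(nx.is_d_separator(G_bar, {"Proximity"}, {"GPA"}, set()))
```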


Estimation

We now revisit and expand upon the mechanics of IV in greater detail. Suppose the data are generated by a process of the form

: y_i = X_i \beta + e_i,

where
* ''i'' indexes observations,
* y_i is the ''i''-th value of the dependent variable,
* X_i is a vector of the ''i''-th values of the independent variable(s) and a constant,
* e_i is the ''i''-th value of an unobserved error term representing all causes of y_i other than X_i, and
* \beta is an unobserved parameter vector.

The parameter vector \beta is the causal effect on y_i of a one unit change in each element of X_i, holding all other causes of y_i constant. The econometric goal is to estimate \beta. For simplicity's sake assume the draws of ''e'' are uncorrelated and that they are drawn from distributions with the same variance (that is, that the errors are serially uncorrelated and homoskedastic). Suppose also that a regression model of nominally the same form is proposed. Given a random sample of ''T'' observations from this process, the ordinary least squares estimator is

: \widehat{\beta}_\mathrm{OLS} = (X^\mathrm{T} X)^{-1} X^\mathrm{T} y = (X^\mathrm{T} X)^{-1} X^\mathrm{T} (X \beta + e) = \beta + (X^\mathrm{T} X)^{-1} X^\mathrm{T} e

where ''X'', ''y'' and ''e'' denote column vectors of length ''T''. This equation is similar to the equation involving \operatorname{cov}(X, y) in the introduction (this is the matrix version of that equation). When ''X'' and ''e'' are uncorrelated, under certain regularity conditions the second term has an expected value conditional on ''X'' of zero and converges to zero in the limit, so the estimator is unbiased and consistent. When ''X'' and the other unmeasured, causal variables collapsed into the ''e'' term are correlated, however, the OLS estimator is generally biased and inconsistent for ''β''. In this case, it is valid to use the estimates to predict values of ''y'' given values of ''X'', but the estimate does not recover the causal effect of ''X'' on ''y''.

To recover the underlying parameter \beta, we introduce a set of variables ''Z'' that is highly correlated with each endogenous component of ''X'' but (in our underlying model) is not correlated with ''e''. For simplicity, one might consider ''X'' to be a ''T'' × 2 matrix composed of a column of constants and one endogenous variable, and ''Z'' to be a ''T'' × 2 matrix consisting of a column of constants and one instrumental variable. However, this technique generalizes to ''X'' being a matrix of a constant and, say, 5 endogenous variables, with ''Z'' being a matrix composed of a constant and 5 instruments. In the discussion that follows, we will assume that ''X'' is a ''T'' × ''K'' matrix and leave this value ''K'' unspecified. An estimator in which ''X'' and ''Z'' are both ''T'' × ''K'' matrices is referred to as just-identified.

Suppose that the relationship between each endogenous component ''x''''i'' and the instruments is given by

: x_i = Z_i \gamma + v_i.

The most common IV specification uses the following estimator:

: \widehat{\beta}_\mathrm{IV} = (Z^\mathrm{T} X)^{-1} Z^\mathrm{T} y

This specification approaches the true parameter as the sample gets large, so long as Z^\mathrm{T} e = 0 in the true model:

: \widehat{\beta}_\mathrm{IV} = (Z^\mathrm{T} X)^{-1} Z^\mathrm{T} y = (Z^\mathrm{T} X)^{-1} Z^\mathrm{T} X \beta + (Z^\mathrm{T} X)^{-1} Z^\mathrm{T} e \rightarrow \beta

As long as Z^\mathrm{T} e = 0 in the underlying process which generates the data, the appropriate use of the IV estimator will identify this parameter. This works because IV solves for the unique parameter that satisfies Z^\mathrm{T} e = 0, and therefore homes in on the true underlying parameter as the sample size grows.

Now an extension: suppose that there are more instruments than there are covariates in the equation of interest, so that ''Z'' is a ''T'' × ''M'' matrix with ''M'' > ''K''. This is often called the over-identified case. In this case, the generalized method of moments (GMM) can be used. The GMM IV estimator is

: \widehat{\beta}_\mathrm{GMM} = (X^\mathrm{T} P_Z X)^{-1} X^\mathrm{T} P_Z y,

where P_Z refers to the projection matrix P_Z = Z (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T}. This expression collapses to the first when the number of instruments is equal to the number of covariates in the equation of interest. The over-identified IV is therefore a generalization of the just-identified IV.

Developing the \widehat{\beta}_\mathrm{GMM} expression:

: \widehat{\beta}_\mathrm{GMM} = (X^\mathrm{T} Z (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} X)^{-1} X^\mathrm{T} Z (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} y

In the just-identified case, we have as many instruments as covariates, so that the dimension of ''X'' is the same as that of ''Z''. Hence, X^\mathrm{T} Z, Z^\mathrm{T} Z and Z^\mathrm{T} X are all square matrices of the same dimension. We can expand the inverse, using the fact that, for any invertible ''n''-by-''n'' matrices ''A'' and ''B'', (''AB'')−1 = ''B''−1''A''−1 (see Invertible matrix#Properties):

: \begin{align} \widehat{\beta}_\mathrm{GMM} &= (Z^\mathrm{T} X)^{-1} (Z^\mathrm{T} Z) (X^\mathrm{T} Z)^{-1} X^\mathrm{T} Z (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} y \\ &= (Z^\mathrm{T} X)^{-1} (Z^\mathrm{T} Z) (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} y \\ &= (Z^\mathrm{T} X)^{-1} Z^\mathrm{T} y \\ &= \widehat{\beta}_\mathrm{IV} \end{align}

Reference: see Davidson and MacKinnon (1993). There is an equivalent under-identified estimator for the case where ''M'' < ''K''. Since the parameters are the solutions to a set of linear equations, an under-identified model using the set of equations Z^\mathrm{T} v = 0 does not have a unique solution.
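In code, both estimators above are short linear-algebra expressions. The sketch below (Python/numpy; the function names are my own, and no rank or validity checks are performed) implements the just-identified formula and the projection-form GMM formula:

```python
# Textbook formulas only: no rank checks, no standard errors. X is T x K
# (including the constant column), Z is T x M with M >= K, y has length T.
import numpy as np

def beta_iv(Z, X, y):
    """Just-identified IV: (Z'X)^{-1} Z'y, for M == K."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

def beta_gmm(Z, X, y):
    """Over-identified IV: (X' P_Z X)^{-1} X' P_Z y, P_Z = Z(Z'Z)^{-1}Z'."""
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # P_Z X, without forming P_Z
    return np.linalg.solve(PzX.T @ X, PzX.T @ y)
```

When ''Z'' is square (''M'' = ''K''), the two functions return the same vector, mirroring the algebraic collapse shown above.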


Interpretation as two-stage least squares

One computational method which can be used to calculate IV estimates is two-stage least squares (2SLS or TSLS). In the first stage, each explanatory variable that is an endogenous covariate in the equation of interest is regressed on all of the exogenous variables in the model, including both exogenous covariates in the equation of interest and the excluded instruments. The predicted values from these regressions are obtained:

Stage 1: Regress each column of ''X'' on ''Z'' (X = Z \delta + \text{errors}):

: \widehat{\delta} = (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} X,

and save the predicted values:

: \widehat{X} = Z \widehat{\delta} = Z (Z^\mathrm{T} Z)^{-1} Z^\mathrm{T} X = P_Z X.

In the second stage, the regression of interest is estimated as usual, except that in this stage each endogenous covariate is replaced with the predicted values from the first stage:

Stage 2: Regress ''Y'' on the predicted values from the first stage:

: Y = \widehat{X} \beta + \mathrm{noise},

which gives

: \widehat{\beta}_\mathrm{2SLS} = \left(\widehat{X}^\mathrm{T} \widehat{X}\right)^{-1} \widehat{X}^\mathrm{T} Y.

This method is only valid in linear models. For categorical endogenous covariates, one might be tempted to use a different first stage than ordinary least squares, such as a probit model for the first stage followed by OLS for the second. This is commonly known in the econometric literature as the ''forbidden regression'', because second-stage IV parameter estimates are consistent only in special cases.

The usual OLS estimator is (\widehat{X}^\mathrm{T} \widehat{X})^{-1} \widehat{X}^\mathrm{T} Y. Replacing \widehat{X} = P_Z X and noting that P_Z is a symmetric and idempotent matrix, so that P_Z^\mathrm{T} P_Z = P_Z P_Z = P_Z:

: \widehat{\beta}_\mathrm{2SLS} = (\widehat{X}^\mathrm{T} \widehat{X})^{-1} \widehat{X}^\mathrm{T} Y = \left(X^\mathrm{T} P_Z^\mathrm{T} P_Z X\right)^{-1} X^\mathrm{T} P_Z^\mathrm{T} Y = \left(X^\mathrm{T} P_Z X\right)^{-1} X^\mathrm{T} P_Z Y.

The resulting estimator of \beta is numerically identical to the expression displayed above. A small correction must be made to the sum-of-squared residuals in the second-stage fitted model in order that the covariance matrix of \beta is calculated correctly.
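A sketch of the two stages spelled out in the same hypothetical numpy notation as above (note that it deliberately omits the covariance-matrix correction just described):

```python
# The two stages made explicit. Because X_hat = P_Z X, the result is
# numerically identical to (X' P_Z X)^{-1} X' P_Z y from the previous
# section. Standard errors from the naive second stage are wrong and
# need the correction described above.
import numpy as np

def beta_2sls(Z, X, y):
    delta = np.linalg.solve(Z.T @ Z, Z.T @ X)             # Stage 1: delta_hat
    X_hat = Z @ delta                                     # fitted values P_Z X
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)  # Stage 2: OLS on X_hat
```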


Non-parametric analysis

When the form of the structural equations is unknown, an instrumental variable Z can still be defined through the equations:

: x = g(z, u) \,
: y = f(x, u) \,

where f and g are two arbitrary functions and Z is independent of U. Unlike linear models, however, measurements of Z, X and Y do not allow for the identification of the average causal effect of X on Y, denoted ACE:

: \text{ACE} = \Pr(y \mid \text{do}(x)) = \operatorname{E}_u[f(x, u)]

Balke and Pearl (1997) derived tight bounds on ACE and showed that these can provide valuable information on the sign and size of ACE. In linear analysis, there is no test to falsify the assumption that Z is instrumental relative to the pair (X, Y). This is not the case when X is discrete. Pearl (2000) has shown that, for all f and g, the following constraint, called the "Instrumental Inequality", must hold whenever Z satisfies the two equations above:

: \max_x \sum_y \left[\max_z \Pr(y, x \mid z)\right] \leq 1.


Interpretation under treatment effect heterogeneity

The exposition above assumes that the causal effect of interest does not vary across observations, that is, that \beta is a constant. Generally, different subjects will respond in different ways to changes in the "treatment" ''x''. When this possibility is recognized, the average effect in the population of a change in ''x'' on ''y'' may differ from the effect in a given subpopulation. For example, the average effect of a job training program may substantially differ across the group of people who actually receive the training and the group which chooses not to receive training. For these reasons, IV methods invoke implicit assumptions on behavioral response, or more generally assumptions over the correlation between the response to treatment and propensity to receive treatment.

The standard IV estimator can recover local average treatment effects (LATE) rather than average treatment effects (ATE). Imbens and Angrist (1994) demonstrate that the linear IV estimate can be interpreted under weak conditions as a weighted average of local average treatment effects, where the weights depend on the elasticity of the endogenous regressor to changes in the instrumental variables. Roughly, that means that the effect of a variable is only revealed for the subpopulations affected by the observed changes in the instruments, and that subpopulations which respond most to changes in the instruments will have the largest effects on the magnitude of the IV estimate. For example, if a researcher uses presence of a land-grant college as an instrument for college education in an earnings regression, she identifies the effect of college on earnings in the subpopulation which would obtain a college degree if a college is present but which would not obtain a degree if a college is not present. This empirical approach does not, without further assumptions, tell the researcher anything about the effect of college among people who would either always or never get a college degree regardless of whether a local college exists.


Weak instruments problem

As Bound, Jaeger, and Baker (1995) note, a problem is caused by the selection of "weak" instruments, instruments that are poor predictors of the endogenous question predictor in the first-stage equation. In this case, the prediction of the question predictor by the instrument will be poor and the predicted values will have very little variation. Consequently, they are unlikely to have much success in predicting the ultimate outcome when they are used to replace the question predictor in the second-stage equation. In the context of the smoking and health example discussed above, tobacco taxes are weak instruments for smoking if smoking status is largely unresponsive to changes in taxes. If higher taxes do not induce people to quit smoking (or not start smoking), then variation in tax rates tells us nothing about the effect of smoking on health. If taxes affect health through channels other than through their effect on smoking, then the instruments are invalid and the instrumental variables approach may yield misleading results. For example, places and times with relatively health-conscious populations may both implement high tobacco taxes and exhibit better health even holding smoking rates constant, so we would observe a correlation between health and tobacco taxes even if it were the case that smoking has no effect on health. In this case, we would be mistaken to infer a causal effect of smoking on health from the observed correlation between tobacco taxes and health.


Testing for weak instruments

The strength of the instruments can be directly assessed because both the endogenous covariates and the instruments are observable. A common rule of thumb for models with one endogenous regressor is: the F-statistic against the null that the excluded instruments are irrelevant in the first-stage regression should be larger than 10. This statistic can be computed as in the sketch below.
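The following is a minimal sketch of that computation for a single endogenous regressor; the inputs are hypothetical: W holds the included exogenous covariates plus the constant, and Zx holds the excluded instruments.

```python
# First-stage F-statistic for one endogenous regressor x: compare the
# first-stage fit with and without the excluded instruments Zx. W is
# T x p (included exogenous covariates plus constant), Zx is T x q.
import numpy as np

def first_stage_F(x, W, Zx):
    def ssr(A, b):                       # sum of squared residuals
        resid = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
        return resid @ resid
    full = np.hstack([W, Zx])
    q = Zx.shape[1]                      # number of restrictions tested
    dof = x.shape[0] - full.shape[1]     # residual degrees of freedom
    return (ssr(W, x) - ssr(full, x)) / q / (ssr(full, x) / dof)

# Rule of thumb: first_stage_F(x, W, Zx) > 10 suggests the instruments
# are not weak.
```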


Statistical inference and hypothesis testing

When the covariates are exogenous, the small-sample properties of the OLS estimator can be derived in a straightforward manner by calculating moments of the estimator conditional on ''X''. When some of the covariates are endogenous so that instrumental variables estimation is implemented, simple expressions for the moments of the estimator cannot be so obtained. Generally, instrumental variables estimators only have desirable asymptotic, not finite sample, properties, and inference is based on asymptotic approximations to the sampling distribution of the estimator. Even when the instruments are uncorrelated with the error in the equation of interest and when the instruments are not weak, the finite sample properties of the instrumental variables estimator may be poor. For example, exactly identified models produce finite sample estimators with no moments, so the estimator can be said to be neither biased nor unbiased, the nominal size of test statistics may be substantially distorted, and the estimates may commonly be far away from the true value of the parameter.


Testing the exclusion restriction

The assumption that the instruments are not correlated with the error term in the equation of interest is not testable in exactly identified models. If the model is overidentified, there is information available which may be used to test this assumption. The most common test of these ''overidentifying restrictions'', called the Sargan–Hansen test, is based on the observation that the residuals should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous. The Sargan–Hansen test statistic can be calculated as TR^2 (the number of observations multiplied by the coefficient of determination) from the OLS regression of the residuals onto the set of exogenous variables. This statistic will be asymptotically chi-squared with ''m'' − ''k'' degrees of freedom under the null that the error term is uncorrelated with the instruments.
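A sketch of the statistic, in the same hypothetical numpy setup as the Estimation section (scipy supplies the chi-squared tail probability):

```python
# Sargan-Hansen test: regress the 2SLS residuals on all exogenous
# variables Z and compare T * R^2 to chi-squared with M - K degrees of
# freedom. Z is T x M, X is T x K (both including a constant column).
import numpy as np
from scipy import stats

def sargan_test(Z, X, y):
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)        # P_Z X
    beta = np.linalg.solve(PzX.T @ X, PzX.T @ y)       # 2SLS estimate
    e = y - X @ beta                                   # IV residuals
    e_hat = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]   # regress e on Z
    tss = (e - e.mean()) @ (e - e.mean())
    r2 = 1 - ((e - e_hat) @ (e - e_hat)) / tss         # R^2 of that regression
    stat = len(y) * r2                                 # T * R^2
    pval = stats.chi2.sf(stat, df=Z.shape[1] - X.shape[1])
    return stat, pval   # a small p-value rejects instrument exogeneity
```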


See also

* Control function (econometrics)
* Optimal instruments






External links


* Chapter from Daniel McFadden's textbook
* by Mark Thoma
* by Mark Thoma